{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6c055ef3",
   "metadata": {
    "origin_pos": 0
   },
   "source": [
    "# Modern Convolutional Neural Networks\n",
    ":label:`chap_modern_cnn`\n",
    "\n",
    "Now that we understand the basics of wiring together CNNs, let's take\n",
    "a tour of modern CNN architectures. This tour is, by\n",
    "necessity, incomplete, given the steady stream of exciting new designs.\n",
    "These architectures matter because not only can\n",
    "they be used directly for vision tasks, but they also serve as basic\n",
    "feature generators for more advanced tasks such as tracking\n",
    ":cite:`Zhang.Sun.Jiang.ea.2021`, segmentation :cite:`Long.Shelhamer.Darrell.2015`, object\n",
    "detection :cite:`Redmon.Farhadi.2018`, or style transfer\n",
    ":cite:`Gatys.Ecker.Bethge.2016`. In this chapter, most sections\n",
    "correspond to a significant CNN architecture that was at some point\n",
    "(or still is) the base model upon which many research projects and\n",
    "deployed systems were built. Each of these networks was briefly a\n",
    "dominant architecture, and many were winners or runners-up in the\n",
    "[ImageNet competition](https://www.image-net.org/challenges/LSVRC/),\n",
    "which has served as a barometer of progress on supervised learning in\n",
    "computer vision since 2010. Only recently have Transformers begun\n",
    "to displace CNNs, starting with :citet:`Dosovitskiy.Beyer.Kolesnikov.ea.2021` and\n",
    "followed by the Swin Transformer :cite:`liu2021swin`. We will cover this development later\n",
    "in :numref:`chap_attention-and-transformers`.\n",
    "\n",
    "While the idea of *deep* neural networks is quite simple (stack\n",
    "together a bunch of layers), performance can vary wildly across\n",
    "architectures and hyperparameter choices.  The neural networks\n",
    "described in this chapter are the product of intuition, a few\n",
    "mathematical insights, and a lot of trial and error.  We present these\n",
    "models in chronological order, partly to convey a sense of the history\n",
    "so that you can form your own intuitions about where the field is\n",
    "heading and perhaps develop your own architectures. For instance,\n",
    "batch normalization and residual connections, both described in this chapter,\n",
    "are two popular ideas for training and designing deep models,\n",
    "and both have since been applied to architectures beyond computer\n",
    "vision.\n",
    "\n",
    "We begin our tour of modern CNNs with AlexNet :cite:`Krizhevsky.Sutskever.Hinton.2012`,\n",
    "the first large-scale network to beat conventional computer\n",
    "vision methods on a major vision challenge; the VGG network\n",
    ":cite:`Simonyan.Zisserman.2014`, which is built from a number of\n",
    "repeating blocks of layers; the network in network (NiN), which\n",
    "convolves whole neural networks patch-wise over inputs\n",
    ":cite:`Lin.Chen.Yan.2013`; GoogLeNet, which uses networks with\n",
    "multi-branch convolutions :cite:`Szegedy.Liu.Jia.ea.2015`; the residual\n",
    "network (ResNet) :cite:`He.Zhang.Ren.ea.2016`, which remains one of\n",
    "the most popular off-the-shelf architectures in computer vision;\n",
    "ResNeXt blocks :cite:`Xie.Girshick.Dollar.ea.2017`\n",
    "for sparser connections;\n",
    "and DenseNet\n",
    ":cite:`Huang.Liu.Van-Der-Maaten.ea.2017` for a generalization of the\n",
    "residual architecture. Over time, many special optimizations for efficient\n",
    "networks have been developed, such as coordinate shifts (ShiftNet) :cite:`wu2018shift`. This line of work\n",
    "culminated in the automatic search for efficient architectures, such as\n",
    "MobileNet v3 :cite:`Howard.Sandler.Chu.ea.2019`, and in the\n",
    "semi-automatic design exploration of :citet:`Radosavovic.Kosaraju.Girshick.ea.2020`\n",
    "that led to the RegNetX/Y models, which we will discuss later in this chapter.\n",
    "The latter work is instructive insofar as it offers a path for marrying brute-force computation with\n",
    "the ingenuity of an experimenter in the search for efficient design spaces. Also of note is\n",
    "the work of :citet:`liu2022convnet`, which shows that training techniques (e.g., optimizers, data augmentation, and regularization)\n",
    "play a pivotal role in improving accuracy. It also shows that long-held assumptions, such as\n",
    "the size of a convolution window, may need to be revisited, given the increase in\n",
    "computation and data. We will address these and many other questions throughout this chapter.\n",
    "\n",
    ":begin_tab:toc\n",
    " - [alexnet](alexnet.ipynb)\n",
    " - [vgg](vgg.ipynb)\n",
    " - [nin](nin.ipynb)\n",
    " - [googlenet](googlenet.ipynb)\n",
    " - [batch-norm](batch-norm.ipynb)\n",
    " - [resnet](resnet.ipynb)\n",
    " - [densenet](densenet.ipynb)\n",
    " - [cnn-design](cnn-design.ipynb)\n",
    ":end_tab:\n"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "required_libs": []
 },
 "nbformat": 4,
 "nbformat_minor": 5
}